Terminology Extraction from Comparable Corpora for Latvian

نویسندگان

  • Tatiana Gornostay
  • Anita Ramm
  • Ulrich Heid
  • Emmanuel Morin
  • Rima Harastani
  • Emmanuel Planas
چکیده

This paper presents the work on terminology extraction from comparable corpora for Latvian. In the first section we introduce our work; the second section briefly describes the concept of the project and the implemented general terminology processing chain; the following two sections focus on terminology extraction workflow for Latvian and evaluation of results, respectively.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

ACCURAT Toolkit for Multi-Level Alignment and Information Extraction from Comparable Corpora

The lack of parallel corpora and linguistic resources for many languages and domains is one of the major obstacles for the further advancement of automated translation. A possible solution is to exploit comparable corpora (non-parallel bior multi-lingual text resources) which are much more widely available than parallel translation data. Our presented toolkit deals with parallel content extract...

متن کامل

Word Co-occurrence Counts Prediction for Bilingual Terminology Extraction from Comparable Corpora

Methods dealing with bilingual lexicon extraction from comparable corpora are often based on word co-occurrence observation and are by essence more effective when using large corpora. In most cases, specialized comparable corpora are of small size, and this particularity has a direct impact on bilingual terminology extraction results. In order to overcome insufficient data coverage and to make ...

متن کامل

Looking at Unbalanced Specialized Comparable Corpora for Bilingual Lexicon Extraction

The main work in bilingual lexicon extraction from comparable corpora is based on the implicit hypothesis that corpora are balanced. However, the historical contextbased projection method dedicated to this task is relatively insensitive to the sizes of each part of the comparable corpus. Within this context, we have carried out a study on the influence of unbalanced specialized comparable corpo...

متن کامل

استخراج پیکره‌ موازی از اسناد قابل‌مقایسه برای بهبود کیفیت ترجمه در سیستم‌های ترجمه ماشینی

Data used for training statistical machine translation method are usually prepared from three resources: parallel, non-parallel and comparable text corpora. Parallel corpora are an ideal resource for translation but due to lack of these kinds of texts, non-parallel and comparable corpora are used either for parallel text extraction. Most of existing methods for exploiting comparable corpora loo...

متن کامل

EM-based Hybrid Model for Bilingual Terminology Extraction from Comparable Corpora

In this paper, we present an unsupervised hybrid model which combines statistical, lexical, linguistic, contextual, and temporal features in a generic EMbased framework to harvest bilingual terminology from comparable corpora through comparable document alignment constraint. The model is configurable for any language and is extensible for additional features. In overall, it produces considerabl...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012